Results 1 - 20 of 1,349
1.
BMC Bioinformatics ; 24(1): 396, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37875804

ABSTRACT

BACKGROUND: Technical progress in computational hardware allows researchers to use new approaches for sequence alignment problems. For a given sequence, we usually use smaller subsequences (anchors) to find possible candidate positions within a reference sequence. We may create pairs ("position", "subsequence") for the reference sequence and keep all such records without compression, even on a budget computer. As sequences for new and reference genomes differ, the goal is to find anchors, so we tolerate differences and keep the number of candidate positions with the same anchors to a minimum. Spaced seeds (masks ignoring symbols at specific locations) are a way to approach the task. An ideal (full sensitivity) spaced seed should enable us to find all such positions subject to a given maximum number of mismatches permitted. RESULTS: Several algorithms to assist seed generation are presented. The first one finds all permitted spaced seeds iteratively. We observe specific patterns for the seeds of the highest weight. There are often periodic seeds with a simple relation between block size, length of the seed and read. The second algorithm produces blocks for periodic seeds for blocks of up to 50 symbols and up to nine mismatches. The third algorithm uses those lists to find spaced seeds for reads of an arbitrary length. Finally, we apply seeds to a real dataset and compare the results with those for other popular seeds. CONCLUSIONS: The PerFSeeB approach helps to significantly reduce the number of reads' possible alignment positions for a known number of mismatches. Lists of long, high-weight spaced seeds are available in Additional file 1. The seeds are best in weight compared to seeds from other papers and can usually be applied to shorter reads. Codes for all algorithms and periodic blocks can be found at https://github.com/vtman/PerFSeeB.
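
The idea of a spaced seed described in this abstract can be illustrated with a minimal sketch. It is not the PerFSeeB code: the function names, the toy seed `1101` and the toy sequences are assumptions made for illustration. A seed is written as a string of `1` (symbol kept) and `0` (symbol ignored); the reference is indexed by the kept symbols, and a read is looked up the same way, so a mismatch that falls on a `0` position does not prevent a hit.

```python
from collections import defaultdict


def apply_seed(window: str, seed: str) -> str:
    """Keep only the symbols at positions where the seed has '1'."""
    return "".join(c for c, s in zip(window, seed) if s == "1")


def index_reference(reference: str, seed: str) -> dict:
    """Map each seeded key to the list of reference positions where it occurs."""
    index = defaultdict(list)
    span = len(seed)
    for pos in range(len(reference) - span + 1):
        index[apply_seed(reference[pos:pos + span], seed)].append(pos)
    return index


def candidate_positions(read: str, seed: str, index: dict) -> list:
    """Return candidate start positions of the read within the reference."""
    span = len(seed)
    hits = set()
    for offset in range(len(read) - span + 1):
        key = apply_seed(read[offset:offset + span], seed)
        for pos in index.get(key, []):
            start = pos - offset  # where the read would begin in the reference
            if start >= 0:
                hits.add(start)
    return sorted(hits)


# Toy data (hypothetical): the read is reference[3:9] with one substitution.
reference = "GATTACAGATTTCA"
seed = "1101"
read = "TAGAGA"  # true origin "TACAGA" at position 3, mismatch at read index 2
index = index_reference(reference, seed)
candidates = candidate_positions(read, seed, index)
```

Here the mismatch sits under the `0` of the seed for the first window of the read, so position 3 is still reported as a candidate. A full-sensitivity seed in the sense of the abstract guarantees this for every placement of up to a given number of mismatches; this toy seed makes no such guarantee.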


Subjects
Algorithms , Data Compression , Sequence Alignment , Sequence Analysis/methods , Sequence Analysis, DNA/methods , Software
2.
BMC Bioinformatics ; 24(1): 180, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37131141

ABSTRACT

BACKGROUND: Large-scale multi-ethnic DNA sequencing data is increasingly available owing to decreasing cost of modern sequencing technologies. Inference of the population structure with such sequencing data is fundamentally important. However, the ultra-dimensionality and complicated linkage disequilibrium patterns across the whole genome make it challenging to infer population structure using traditional principal component analysis based methods and software. RESULTS: We present the ERStruct Python Package, which enables the inference of population structure using whole-genome sequencing data. By leveraging parallel computing and GPU acceleration, our package achieves significant improvements in the speed of matrix operations for large-scale data. Additionally, our package features adaptive data splitting capabilities to facilitate computation on GPUs with limited memory. CONCLUSION: Our Python package ERStruct is an efficient and user-friendly tool for estimating the number of top informative principal components that capture population structure from whole genome sequencing data.
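
The data-splitting idea mentioned in this abstract can be sketched in a generic way. The code below is not the ERStruct API; the function name and the toy genotype matrix are assumptions. It accumulates the individual-by-individual Gram matrix over blocks of markers, so that only one block of columns needs to be held in fast memory at a time. A PCA-based method would then centre the data and examine the eigenvalues of such a matrix.

```python
def gram_in_chunks(genotypes, chunk_size):
    """Accumulate G = X X^T over column blocks of the genotype matrix X.

    genotypes: list of rows, one row per individual, one column per marker.
    chunk_size: number of markers processed per block (hypothetical knob).
    """
    n = len(genotypes)
    m = len(genotypes[0]) if n else 0
    gram = [[0.0] * n for _ in range(n)]
    for start in range(0, m, chunk_size):
        # Only this slice of markers is needed for the current update.
        block = [row[start:start + chunk_size] for row in genotypes]
        for i in range(n):
            for j in range(i, n):
                dot = sum(x * y for x, y in zip(block[i], block[j]))
                gram[i][j] += dot
                if i != j:
                    gram[j][i] += dot
    return gram


# Toy genotype matrix (hypothetical): 3 individuals, 4 markers coded 0/1/2.
toy = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 1]]
full = gram_in_chunks(toy, 4)
split = gram_in_chunks(toy, 2)
```

The result does not depend on the block size, which is what makes it possible to choose the block size according to the memory available.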


Subjects
Genome , Software , Whole Genome Sequencing , Sequence Analysis/methods , Principal Component Analysis
3.
Article in English | MEDLINE | ID: mdl-34928803

ABSTRACT

Multiple sequence alignment has been the traditional and well established approach of sequence analysis and comparison, though it is time and memory consuming. As the scale of sequencing data is increasing day by day, the importance of faster yet accurate alignment-free methods is on the rise. Several alignment-free sequence analysis methods have been established in the literature in recent years, which extract numerical features from genomic data to analyze sequences and also to estimate phylogenetic relationship among genes and species. Minimal Absent Word (MAW) is an effective concept for representing characteristics of a sequence in an alignment-free manner. In this study, we present CD-MAWS, a distance measure based on cosine of the angle between composition vectors constructed using minimal absent words, for sequence analysis in a computationally inexpensive manner. We have benchmarked CD-MAWS using several AFProject datasets, such as Fish mtDNA, E. coli, Plants, Shigella and Yersinia datasets, and found it to perform quite well. Applied to several other biological datasets such as mammal mtDNA, bacterial genomes and viral genomes, CD-MAWS resolved phylogenetic relationships similar to or better than state-of-the-art alignment-free methods such as Mash, Skmer, Co-phylog and kSNP3.
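
A minimal sketch of the two ingredients named in this abstract, minimal absent words and a cosine-based distance, is given below. It is not the CD-MAWS implementation: the brute-force enumeration, the binary presence vectors (how the paper builds its composition vectors is not stated here) and all function names are assumptions. A word is a minimal absent word of a sequence if it does not occur in the sequence while its longest proper prefix and longest proper suffix both do.

```python
from itertools import product
from math import sqrt


def substrings(s: str, max_len: int) -> set:
    """All substrings of s with length 1..max_len, plus the empty word."""
    found = {""}
    for k in range(1, max_len + 1):
        for i in range(len(s) - k + 1):
            found.add(s[i:i + k])
    return found


def minimal_absent_words(s: str, max_len: int, alphabet: str = "ACGT") -> set:
    """Brute-force MAWs of s up to length max_len (fine for toy inputs only)."""
    present = substrings(s, max_len)
    maws = set()
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            w = "".join(letters)
            if w not in present and w[:-1] in present and w[1:] in present:
                maws.add(w)
    return maws


def cosine_distance(a: set, b: set) -> float:
    """1 - cosine of the angle between binary presence vectors of two MAW sets."""
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / sqrt(len(a) * len(b))


# Toy usage (hypothetical sequences).
maws_x = minimal_absent_words("ACGTACGT", 3)
maws_y = minimal_absent_words("ACGTTGCA", 3)
d = cosine_distance(maws_x, maws_y)
```

The enumeration grows exponentially with `max_len`; practical tools compute minimal absent words with suffix-based index structures instead.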


Subjects
Algorithms , Genomics , Animals , Phylogeny , Genomics/methods , Sequence Analysis/methods , Escherichia coli , Genome, Bacterial , Sequence Analysis, DNA/methods , Mammals
4.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subjects
Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods
5.
J Mol Biol ; 434(15): 167586, 2022 08 15.
Article in English | MEDLINE | ID: mdl-35427634

ABSTRACT

Machine learning or deep learning models have been widely used for taxonomic classification of metagenomic sequences and many studies reported high classification accuracy. Such models are usually trained based on sequences in several training classes in hope of accurately classifying unknown sequences into these classes. However, when deploying the classification models on real testing data sets, sequences that do not belong to any of the training classes may be present and are falsely assigned to one of the training classes with high confidence. Such sequences are referred to as out-of-distribution (OOD) sequences and are ubiquitous in metagenomic studies. To address this problem, we develop a deep generative model-based method, MLR-OOD, that measures the probability of a testing sequence belonging to OOD by the likelihood ratio of the maximum of the in-distribution (ID) class conditional likelihoods and the Markov chain likelihood of the testing sequence measuring the sequence complexity. We compose three different microbial data sets consisting of bacterial, viral, and plasmid sequences for comprehensively benchmarking OOD detection methods. We show that MLR-OOD achieves the state-of-the-art performance demonstrating the generality of MLR-OOD to various types of microbial data sets. It is also shown that MLR-OOD is robust to the GC content, which is a major confounding effect for OOD detection of genomic sequences. In conclusion, MLR-OOD will greatly reduce false positives caused by OOD sequences in metagenomic sequence classification.
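
The likelihood-ratio score described in this abstract can be sketched as follows. This is not the MLR-OOD implementation: the paper uses deep generative models for the class-conditional likelihoods, whereas this sketch substitutes first-order Markov chains with add-one smoothing so that it stays self-contained. The function names, the uniform initial-symbol probability, the restriction to the symbols A, C, G, T and the reading that a higher score suggests an in-distribution sequence are all assumptions.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"  # assumption: sequences contain only these symbols


def fit_markov(sequences):
    """First-order Markov chain with add-one smoothing (stand-in model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a][b] for b in ALPHABET) + len(ALPHABET)
        model[a] = {b: (counts[a][b] + 1) / total for b in ALPHABET}
    return model


def log_likelihood(seq, model):
    """Log-likelihood of seq, with a uniform first-symbol probability."""
    ll = log(1.0 / len(ALPHABET))
    for a, b in zip(seq, seq[1:]):
        ll += log(model[a][b])
    return ll


def mlr_score(seq, class_models):
    """Max class log-likelihood minus the log-likelihood under a Markov chain
    fitted to the test sequence itself (a proxy for sequence complexity)."""
    best_class = max(log_likelihood(seq, m) for m in class_models)
    complexity = log_likelihood(seq, fit_markov([seq]))
    return best_class - complexity


# Toy usage (hypothetical data): one training class made of AC repeats.
class_models = [fit_markov(["AC" * 50])]
score_id = mlr_score("ACACACAC", class_models)
score_ood = mlr_score("GTGTGTGT", class_models)
```

Both test sequences are equally repetitive, so the complexity term is the same for each; only the class-conditional term separates them, and the sequence that resembles the training class receives the higher score.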


Subjects
Genomics , Metagenomics , Sequence Analysis , Algorithms , Machine Learning , Markov Chains , Metagenome , Metagenomics/methods , Sequence Analysis/methods
6.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35383372

ABSTRACT

With the advances in sequencing technologies, a huge amount of biological data is extracted nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. The methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and Protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool not only supports all features in iLearnPlus but also 30 additional features which exist in the literature. Moreover, our tool is based on the R language, which offers bioinformaticians an alternative for transforming sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus: we transform the sequences much faster. We convert small nucleotide sequences faster by a median of 2.8X, while we outperform iLearnPlus by a median of 6.3X for large sequences. Finally, for amino acid sequences, our tool achieves a median speedup of 23.9X.
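
Converting a sequence into a feature vector, the step this abstract is about, can be illustrated with a generic k-mer frequency descriptor. The sketch is in Python rather than R and does not reproduce the API of the tool described here or of iLearnPlus. The function name and the choice of descriptor are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence: str, k: int, alphabet: str = "ACGT") -> list:
    """Return normalized k-mer frequencies in lexicographic alphabet order.

    Windows containing symbols outside the alphabet (e.g. 'N') are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / windows for kmer in kmers]


# Toy usage: a 16-dimensional dinucleotide feature vector.
features = kmer_frequencies("ACGTACGTAC", 2)
```

The vector has a fixed length of `len(alphabet) ** k` whatever the sequence length, which is what lets sequences of different lengths be fed to the same machine learning model.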


Subjects
Machine Learning , Proteins , DNA/genetics , Humans , Proteins/chemistry , RNA/genetics , Sequence Analysis/methods
7.
J Am Soc Mass Spectrom ; 33(3): 510-520, 2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35157441

ABSTRACT

With the increased development of new RNA-based therapeutics, the need for robust analytical methods for confirming sequences and mapping modifications has accelerated. Characterizing modified ribonucleic acids using mass spectrometry is challenging because diagnostic fragmentation may be suppressed for modified nucleotides, thus hampering complete sequence coverage and the confident localization of modifications. Ultraviolet photodissociation (UVPD) has shown great potential for the characterization of nucleic acids due to extensive backbone fragmentation. Activated electron photodetachment dissociation (a-EPD) has also been used as an alternative to capitalize on the dominant charge-reduction pathway prevalent in UVPD, facilitate dissociation, and produce high abundances of fragment ions. Here, we compare higher-energy collisional activation (HCD), UVPD using 193 and 213 nm photons, and a-EPD for the top-down sequencing of modified nucleic acids, including methylated, phosphorothioate, and locked nucleic acid-modified DNA. The presence of these modifications alters the fragmentation pathways observed upon UVPD and a-EPD, and extensive backbone cleavage is observed that results in the production of fragment ions that retain the modifications and allow them to be pinpointed. LNA and 2'-O-methoxy phosphorothioate modifications caused a significant suppression of fragmentation for UVPD but not for a-EPD, whereas phosphorothioate bonds did not cause any significant suppression for either method. The incorporation of 2'-O-methyl modifications suppressed fragmentation of the antisense strand of patisiran, which resulted in some gaps in sequence coverage. However, UVPD provided the highest sequence coverage when compared to a-EPD.


Subjects
Mass Spectrometry/methods , Oligoribonucleotides , Sequence Analysis/methods , Electrons , Oligoribonucleotides/analysis , Oligoribonucleotides/chemistry , Oligoribonucleotides/radiation effects , Photolysis , Ultraviolet Rays
8.
PLoS One ; 17(1): e0261014, 2022.
Article in English | MEDLINE | ID: mdl-35025877

ABSTRACT

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that may impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19 positive patients. ARTIC data covered 94.47% of the virus genome fraction in the positive control and patient samples. Variant analysis in the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions & deletions. On the other hand, long-amplicon data detected 156 mutations, of which 80% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data improved the genomic coverage to 97.03% and identified 214 high confidence mutations. The combined final set of 214 mutations included 203 SNVs, 8 deletions and 3 insertions. Analysis showed 26 SARS-CoV-2 lineage defining mutations including 4 known variants of concern K417N, E484K, N501Y, P618H in spike gene. Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in ARTIC data. For example, G172V mutation in the ORF3a protein and A2A mutation in Membrane protein were missed by the ARTIC assay. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.


Subjects
Genome, Viral/genetics , Genomics/methods , SARS-CoV-2/genetics , Whole Genome Sequencing/methods , COVID-19/virology , Humans , RNA, Viral/genetics , Sequence Analysis/methods
9.
Nat Microbiol ; 7(1): 108-119, 2022 01.
Article in English | MEDLINE | ID: mdl-34907347

ABSTRACT

The global spread and continued evolution of SARS-CoV-2 has driven an unprecedented surge in viral genomic surveillance. Amplicon-based sequencing methods provide a sensitive, low-cost and rapid approach but suffer a high potential for contamination, which can undermine laboratory processes and results. This challenge will increase with the expanding global production of sequences across a variety of laboratories for epidemiological and clinical interpretation, as well as for genomic surveillance of emerging diseases in future outbreaks. We present SDSI + AmpSeq, an approach that uses 96 synthetic DNA spike-ins (SDSIs) to track samples and detect inter-sample contamination throughout the sequencing workflow. We apply SDSIs to the ARTIC Consortium's amplicon design, demonstrate their utility and efficiency in a real-time investigation of a suspected hospital cluster of SARS-CoV-2 cases and validate them across 6,676 diagnostic samples at multiple laboratories. We establish that SDSI + AmpSeq provides increased confidence in genomic data by detecting and correcting for relatively common, yet previously unobserved modes of error, including spillover and sample swaps, without impacting genome recovery.
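
The sample-tracking logic described in this abstract can be sketched as a simple check on spike-in read counts. This is not the SDSI + AmpSeq pipeline: the function name, the spike-in labels, the status strings and the 1% threshold are all hypothetical. Each sample is assumed to have received one known spike-in; reads assigned to any other spike-in suggest contamination, and a different dominant spike-in suggests a sample swap.

```python
def check_sample(expected_sdsi, sdsi_read_counts, max_foreign_fraction=0.01):
    """Classify one sample from its per-spike-in read counts.

    expected_sdsi: label of the spike-in added to this sample.
    sdsi_read_counts: dict mapping spike-in label -> number of reads.
    max_foreign_fraction: tolerated share of reads from other spike-ins
        (hypothetical threshold, not taken from the paper).
    """
    total = sum(sdsi_read_counts.values())
    if total == 0:
        return "no_spike_in_reads"
    dominant = max(sdsi_read_counts, key=sdsi_read_counts.get)
    if dominant != expected_sdsi:
        return "possible_sample_swap"
    foreign = total - sdsi_read_counts[expected_sdsi]
    if foreign / total > max_foreign_fraction:
        return "possible_contamination"
    return "ok"


# Toy usage (hypothetical labels and counts).
clean = check_sample("SDSI_07", {"SDSI_07": 990, "SDSI_12": 2})
mixed = check_sample("SDSI_07", {"SDSI_07": 900, "SDSI_12": 100})
swapped = check_sample("SDSI_07", {"SDSI_12": 500})
```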


Subjects
DNA Primers/standards , SARS-CoV-2/genetics , Sequence Analysis/standards , COVID-19/diagnosis , DNA Primers/chemical synthesis , Genome, Viral/genetics , Humans , Quality Control , RNA, Viral/genetics , Reproducibility of Results , Sequence Analysis/methods , Whole Genome Sequencing , Workflow
10.
Mol Biol Rep ; 49(2): 951-969, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34773550

ABSTRACT

BACKGROUND: Using in silico sequence analyses, the present study aims to clone and express the gene-encoding sequence of a GH19 chitinase from Enterobacter sp. in Escherichia coli. METHODS AND RESULTS: The putative open reading frame of a GH19 chitinase from Enterobacter sp. strain EGY1 was cloned and expressed into pGEM®-T and pET-28a (+) vectors, respectively, using a degenerate primer. The isolated nucleotide sequence (1821 bp, GenBank accession no.: MK533791.2) was translated to a chiRAM protein (606 amino acids, UniProt accession no.: A0A4D6J2L9). The in silico protein sequence analysis of chiRAM revealed a class I GH19 chitinase: an N-terminus signal peptide (Met1-Ala23), a catalytic domain (Val83-Glu347 and the catalytic triad Glu149, Glu171, and Ser218), a proline-rich hinge region (Pro414-Pro450), a polycystic kidney disease protein motif (Gly465-Ser533), a C-terminus chitin-binding domain (Ala553-Glu593), and conserved class I motifs (NYNY and AQETGG). A three-dimensional model was constructed by LOMETS MODELLER of PDB template: 2dkvA (class I chitinase of Oryza sativa L. japonica). Recombinant chiRAM was overexpressed as inclusion bodies (IBs) (~72 kDa; SDS-PAGE) in 1.0 mM IPTG induced E. coli BL21 (DE3) Rosetta strain at room temperature 18 h after induction. Optimized expression yielded active chiRAM with 1.974 ± 0.0002 U/mL, on shrimp colloidal chitin (SCC), in induced E. coli BL21 (DE3) Rosetta cells growing in SB medium. LC-MS/MS identified a band of 72 kDa in the soluble fraction with a 52.3% coverage sequence exclusive to the GH19 chitinase of Enterobacter cloacae (WP_063869339.1). CONCLUSIONS: Although chiRAM of Enterobacter sp. was successfully cloned and expressed in E. coli with appreciable chitinase activity, future studies should focus on minimizing IBs to facilitate chiRAM purification and characterization.


Subjects
Chitinases/genetics , Enterobacter/genetics , Amino Acid Sequence/genetics , Catalytic Domain/genetics , Chitin/chemistry , Chitin/genetics , Chitin/metabolism , Chitinases/metabolism , Chromatography, Liquid/methods , Cloning, Molecular/methods , Computer Simulation , Escherichia coli/genetics , Open Reading Frames/genetics , Plant Proteins , Sequence Analysis/methods , Tandem Mass Spectrometry/methods
11.
Interdiscip Sci ; 14(1): 1-14, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34487327

ABSTRACT

The rapid advances in sequencing technology have led to an explosion of sequence data. Sequence alignment is the central and fundamental problem in many sequence analysis procedures, while local alignment is often the kernel of these algorithms. Usually, the Smith-Waterman algorithm is used to find the best subsequence match between given sequences. However, its high time complexity makes the algorithm time-consuming. Many approaches have been developed to accelerate and parallelize it, such as vector-level parallelization, thread-level parallelization, process-level parallelization, and heterogeneous acceleration, but current research seems unsystematic, which hinders further work on parallelizing the algorithm. In this paper, we summarize the current research status of parallel local alignments and describe the data layout in these works. Based on the research status, we emphasize large-scale genomic comparisons. By surveying the performance of some typical alignment tools, we discuss some possible future directions. We hope our work will provide the developers of alignment tools with technical principle support, and help researchers choose proper alignment tools.
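
For reference, the Smith-Waterman recurrence that this survey discusses can be written as a short sequential sketch. It uses a linear gap penalty and returns only the best local alignment score; the scoring values are illustrative assumptions, and none of the parallelization strategies named in the abstract are shown.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Only two rows of the dynamic-programming matrix are kept, so memory is
    O(len(b)) while time remains O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy usage: the shared substring "ACG" gives three matches.
score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

Each cell depends on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That dependency pattern is why the data layout matters for the vector-level parallelization mentioned above.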


Subjects
Algorithms , Software , Genomics , Sequence Alignment , Sequence Analysis/methods
12.
Arq. Inst. Biol. (Online) ; 89: e00302021, 2022. illus, tab, graph
Article in English | LILACS, VETINDEX | ID: biblio-1416780

ABSTRACT

Milk is an essential food, widely consumed by the population. Brazil is one of the world's largest producers of milk. Milk quality is influenced by several factors in all its stages of production. The aim of this study was to determine the microbiological profile of refrigerated and processed raw bovine milk from industries in Vale do Taquari, state of Rio Grande do Sul, Brazil, using metagenomic analysis. A total of six samples were collected, one of refrigerated raw milk from the tanker truck, one of pasteurized milk and one of milk sterilized by the ultra-high temperature (UHT) process, in each of the industries. The identification of the milk microbiota was performed by sequencing the 16S rRNA gene. The results show that refrigerated raw milk has a greater number of microorganisms, followed by pasteurized milk and sterilized milk, successively. Processed milk showed the presence of beneficial microorganisms such as Streptococcus thermophilus and Streptococcus macedonicus. Nevertheless, even UHT milk showed the presence of microorganisms considered harmful, such as the Bacillus cereus group, Aeromonas dhakensis, Enterobacter bacterium and Acinetobacter haemolyticus. Metagenomics is a valuable tool for the thorough evaluation of the milk microbiota in order to implement the processing stages in industries.


Subjects
Sequence Analysis/methods , Milk/microbiology , Microbiota , Brazil , Chilled Foods , Raw Foods/analysis
13.
Genome Biol ; 22(1): 351, 2021 12 28.
Article in English | MEDLINE | ID: mdl-34963480

ABSTRACT

A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.


Subjects
Sequence Analysis/methods , Single-Cell Analysis/methods , Chromatin , Chromatin Immunoprecipitation Sequencing , Humans , RNA-Seq
14.
Microbiol Spectr ; 9(3): e0100321, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34756092

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in 2019 and has become a major global pathogen in an astonishingly short period of time. The emergence of SARS-CoV-2 has been notable due to its impacts on residents in long-term care facilities (LTCFs). LTCF residents tend to possess several risk factors for severe outcomes of SARS-CoV-2 infection, including advanced age and the presence of comorbidities. Indeed, residents of LTCFs represent approximately 40% of SARS-CoV-2 deaths in the United States. Few studies have focused on the prevalence and transmission dynamics of SARS-CoV-2 among LTCF staff during the early months of the pandemic, prior to mandated surveillance testing. To assess the prevalence and incidence of SARS-CoV-2 among LTCF staff, characterize the extent of asymptomatic infections, and investigate the genomic epidemiology of the virus within these settings, we sampled staff for 8 to 11 weeks at six LTCFs with nasopharyngeal swabs from March through June of 2020. We determined the presence and levels of viral RNA and infectious virus and sequenced 54 nearly complete genomes. Our data revealed that over 50% of infections were asymptomatic/mildly symptomatic and that there was a strongly significant relationship between viral RNA (vRNA) and infectious virus, prolonged infections, and persistent vRNA (4+ weeks) in a subset of individuals, and declining incidence over time. Our data suggest that asymptomatic SARS-CoV-2-infected LTCF staff contributed to virus persistence and transmission within the workplace during the early pandemic period. Genetic epidemiology data generated from samples collected during this period support that SARS-CoV-2 was commonly spread between staff within an LTCF and that multiple-introduction events were less common. 
IMPORTANCE Our work comprises unique data on the characteristics of SARS-CoV-2 dynamics among staff working at LTCFs in the early months of the SARS-CoV-2 pandemic prior to mandated staff surveillance testing. During this time period, LTCF residents were largely sheltering-in-place. Given that staff were able to leave and return daily and could therefore be a continued source of imported or exported infection, we performed weekly SARS-CoV-2 PCR on nasal swab samples collected from this population. There are limited data from the early months of the pandemic comprising longitudinal surveillance of staff at LTCFs. Our data reveal the surprisingly high level of asymptomatic/presymptomatic infections within this cohort during the early months of the pandemic and show genetic epidemiological analyses that add novel insights into both the origin and transmission of SARS-CoV-2 within LTCFs.


Subjects
COVID-19 Testing/methods , COVID-19/diagnosis , COVID-19/epidemiology , Hospitals , Long-Term Care , SARS-CoV-2/isolation & purification , Sequence Analysis/methods , Adolescent , Adult , Aged , Asymptomatic Infections/epidemiology , COVID-19/virology , Cohort Studies , Diagnostic Tests, Routine , Epidemiological Monitoring , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Pandemics , Phylogeny , Prevalence , RNA, Viral , SARS-CoV-2/classification , SARS-CoV-2/genetics , Specimen Handling , Young Adult
15.
Clin Transl Med ; 11(11): e589, 2021 11.
Article in English | MEDLINE | ID: mdl-34842356

ABSTRACT

BACKGROUND: Few studies have discussed the contradictory roles of mutated-PI3Kα in HER2-positive (HER2+) breast cancer. Thus, we characterised the adaptive roles of PI3Kα mutations in HER2+ tumour progression. METHODS: We conducted prospective clinical sequencing of 1923 Chinese breast cancer patients and illustrated the clinical significance of PIK3CA mutations in the locally advanced and advanced HER2+ cohort. A high-throughput PIK3CA mutations-barcoding screen was performed to reveal impactful mutation sites in tumour growth and drug responses. RESULTS: PIK3CA mutations acted as a protective factor in treatment-naïve patients; however, advanced/locally advanced patients harbouring mutated-PI3Kα exhibited a higher progressive disease rate (100% vs. 15%, p = .000053) and a lower objective response rate (81.7% vs. 95.4%, p = .0008) in response to trastuzumab-based therapy. Meanwhile, patients exhibiting anti-HER2 resistance had a relatively high variant allele fraction (VAF) of PIK3CA mutations; we defined a VAF > 12.23% as a predictor of poor anti-HER2 neoadjuvant treatment efficacy. The pooled mutations screen revealed that specific PI3Kα mutation alleles mediated their own biological effects. PIK3CA functional mutations suppressed the growth of HER2+ cells, but conferred anti-HER2 resistance, which can be reversed by the PI3Kα-specific inhibitor BYL719. CONCLUSIONS: We proposed adaptive treatment strategies in which mutated PIK3CA and amplified ERBB2 should be concomitantly inhibited when patients are exposed to continuous anti-HER2 therapy, while the combination of anti-HER2 and anti-PI3Kα treatment was not essential for anti-HER2 treatment-naïve patients. These findings improve the understanding of genomics-guided treatment in the different progressions of HER2+ breast cancer.


Subjects
Breast Neoplasms/drug therapy , Receptor, ErbB-2/genetics , Sequence Analysis/statistics & numerical data , Adaptation, Physiological/drug effects , Adaptation, Physiological/genetics , Breast Neoplasms/genetics , Breast Neoplasms/physiopathology , China , Cohort Studies , Female , Humans , Prospective Studies , Sequence Analysis/methods
16.
PLoS Comput Biol ; 17(10): e1008950, 2021 10.
Article in English | MEDLINE | ID: mdl-34613974

ABSTRACT

Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis/methods , Software , Algorithms , Databases, Genetic
17.
Genes (Basel) ; 12(7)2021 07 14.
Article in English | MEDLINE | ID: mdl-34356090

ABSTRACT

Poland is the largest European producer of goose, and goose breeding has become an essential and still growing branch of the poultry industry. The most frequently bred goose is the White Koluda® breed, constituting 95% of the country's population, whereas geese of regional varieties are bred in smaller, conservation flocks. However, the genetic diversity of geese remains poorly explored, mainly because the most commonly used tools are of limited use in non-model organisms. One of the most accurate markers used in population genetics is the single nucleotide polymorphism (SNP). A highly efficient strategy for genome-wide SNP detection is genotyping-by-sequencing (GBS), which has already been widely applied in many organisms. This study attempts to use GBS in 12 conservation goose breeds and the White Koluda® breed maintained in Poland. The GBS method allowed for the detection of 3833 common raw SNPs. After filtering for read depth and allele characters, the final marker panel used for the differentiation analysis comprised 791 SNPs. These variants were located within 11 different genes, and one of the most diversified variants was associated with the EDAR gene, which is especially interesting as it participates in plumage development, which plays a crucial role in goose breeding.


Subjects
Geese/genetics , Genetic Variation/genetics , Polymorphism, Single Nucleotide/genetics , Alleles , Animal Husbandry/methods , Animals , Biomarkers , Breeding/methods , Genetics, Population/methods , Genotype , Genotyping Techniques/methods , Poland , Sequence Analysis/methods
18.
STAR Protoc ; 2(3): 100716, 2021 09 17.
Article in English | MEDLINE | ID: mdl-34401782

ABSTRACT

Diatoms are a major group of microalgae that initiate biofouling by surface colonization of human-made underwater structures; however, the involved regulatory pathways remain uncharacterized. Here, we describe a protocol for identifying and validating regulatory genes involved in the morphology shift of the model diatom species Phaeodactylum tricornutum during surface colonization. We also provide a workflow for characterizing biofouling transformants. By using this protocol, gene targets such as GPCR signaling genes could be identified and manipulated to turn off diatom biofouling. For complete information on the generation and use of this protocol, please refer to Fu et al. (2020).


Subjects
Biofouling/prevention & control , Diatoms/genetics , RNA/isolation & purification , Ascomycota/genetics , Ascomycota/metabolism , Diatoms/metabolism , Gene Expression/genetics , Gene Regulatory Networks/genetics , Microalgae/genetics , Sequence Analysis/methods
19.
Adv Mater ; 33(36): e2102349, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34309086

ABSTRACT

The in situ synthesis of biomolecules on glass surfaces for direct bioscreening can be a powerful tool in the fields of pharmaceutical sciences, biomaterials, and chemical biology. However, it is still challenging to 1) achieve this conventional multistep combinatorial synthesis on glass surfaces with small feature sizes and high yields and 2) develop a surface which is compatible with solid-phase syntheses, as well as the subsequent bioscreening. This work reports an amphiphilic coating of a glass surface on which small droplets of polar aprotic organic solvents can be deposited with an enhanced contact angle and inhibited motion to permit fully automated multiple rounds of the combinatorial synthesis of small-molecule compounds and peptides. This amphiphilic coating can be switched into a hydrophilic network for protein- and cell-based screening. Employing this in situ synthesis method, chemical space can be probed via array technology with unprecedented speed for various applications, such as lead discovery/optimization in medicinal chemistry and biomaterial development.


Subjects
Glass/chemistry , Sequence Analysis/methods , Solid-Phase Synthesis Techniques/methods , Hydrogels/chemistry , Hydrophobic and Hydrophilic Interactions , Ligands , Organic Chemicals/chemistry , Peptides/chemistry , Proteins/chemistry , Solvents/chemistry , Surface Properties , Wettability
20.
J Med Virol ; 93(12): 6828-6832, 2021 12.
Article in English | MEDLINE | ID: mdl-34314048

ABSTRACT

A cluster of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections was found in a cargo ship under repair in Zhoushan, China. Twelve of 20 crew members were identified as SARS-CoV-2 positive. We analyzed four sequences and identified them all in the Delta branch emerging from India with 7-8 amino acid mutation sites in the spike protein.


Subjects
COVID-19/virology , SARS-CoV-2/genetics , China , Genome, Viral/genetics , Humans , India , Phylogeny , Sequence Analysis/methods , Ships/methods , Spike Glycoprotein, Coronavirus/genetics